
feat(hpc): LazyLock frozen SIMD dispatch table — detect once, keep CPU choice forever#38

AdaWorldAPI merged 1 commit into master from claude/transcode-deepnsm-rust-oNa1Z on Mar 28, 2026.

@AdaWorldAPI (Owner)


simd_dispatch.rs (300+ lines, 7 tests):

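SimdDispatch: struct of function pointers, built on first access via LazyLock and never modified afterwards.
Each field is a fn pointer to the best available implementation for this CPU.
After initialization, a call costs one load through the table plus one indirect call. There is no
per-call feature-detection branch; the only branch left is LazyLock's already-initialized check,
which always goes the same way and is predicted perfectly.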
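A minimal sketch of the frozen-table pattern, assuming a single hypothetical `byte_count` entry. The field name, the `byte_count_scalar`/`byte_count_avx2` helpers, and the use of `is_x86_feature_detected!` in place of the crate's `simd_caps()` are illustrative assumptions, not the code from this PR. The AVX2 variant is simply the scalar loop compiled with `#[target_feature(enable = "avx2")]`, so LLVM is free to auto-vectorize it.

```rust
use std::sync::LazyLock;

/// One function pointer per dispatched operation (hypothetical field).
pub struct SimdDispatch {
    pub byte_count: fn(&[u8], u8) -> usize,
}

/// Portable fallback: count occurrences of `needle` in `haystack`.
fn byte_count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Same loop, compiled with AVX2 enabled so LLVM may vectorize it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn byte_count_avx2_impl(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Safe wrapper with a plain `fn` signature so it fits in the table.
#[cfg(target_arch = "x86_64")]
fn byte_count_avx2(haystack: &[u8], needle: u8) -> usize {
    // SAFETY: this wrapper is only installed after AVX2 was detected.
    unsafe { byte_count_avx2_impl(haystack, needle) }
}

/// Runs feature detection exactly once and picks an implementation.
fn build_table() -> SimdDispatch {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return SimdDispatch { byte_count: byte_count_avx2 };
        }
    }
    SimdDispatch { byte_count: byte_count_scalar }
}

/// Initialized on first access; read-only for the rest of the process.
pub static SIMD_DISPATCH: LazyLock<SimdDispatch> = LazyLock::new(build_table);
```

Call sites then look like `(SIMD_DISPATCH.byte_count)(data, b'\n')`; the extra parentheses are required because `byte_count` is a field holding a pointer, not a method.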

SimdTier enum: Avx512 / Avx2 / Sse2 / Scalar / WasmSimd128 (future).
Selected once based on simd_caps() detection. Frozen forever.
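A sketch of tier selection under the same assumptions: `is_x86_feature_detected!` stands in for the crate's `simd_caps()`, and `detect_tier`/`SIMD_TIER` are hypothetical names. `WasmSimd128` is declared but never selected, matching its "future" status.

```rust
use std::sync::LazyLock;

/// Widest-to-narrowest SIMD tiers.
#[allow(dead_code)] // not every variant is constructed on every target
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdTier {
    Avx512,
    Avx2,
    Sse2,
    Scalar,
    WasmSimd128, // future: not selected yet
}

/// x86_64: probe from the widest tier down.
#[cfg(target_arch = "x86_64")]
fn detect_tier() -> SimdTier {
    if is_x86_feature_detected!("avx512f") {
        SimdTier::Avx512
    } else if is_x86_feature_detected!("avx2") {
        SimdTier::Avx2
    } else {
        // SSE2 is part of the x86_64 baseline, so this is always valid.
        SimdTier::Sse2
    }
}

/// Other targets fall back to scalar code for now.
#[cfg(not(target_arch = "x86_64"))]
fn detect_tier() -> SimdTier {
    SimdTier::Scalar
}

/// Detected on first access, then frozen for the life of the process.
pub static SIMD_TIER: LazyLock<SimdTier> = LazyLock::new(detect_tier);
```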

Before: `if simd_caps().avx512f { avx512_fn() } else { scalar_fn() }` → ~1ns + a feature branch on every call
After: `(SIMD_DISPATCH.fn_ptr)(args)` → ~0.3ns, no feature branch
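The two call shapes side by side, as a sketch with a hypothetical `sum_u32` operation; the ~1ns / ~0.3ns figures above are the PR author's and are not reproduced here. Note that `is_x86_feature_detected!` caches its result after the first probe, so the "before" shape pays a cached load plus a branch per call rather than a full CPUID.

```rust
use std::sync::LazyLock;

fn sum_u32_scalar(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_u32_avx2(xs: &[u32]) -> u32 {
    // Same loop; AVX2 codegen is enabled for this function only.
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Before: every call re-checks the CPU feature and branches.
fn sum_u32_branching(xs: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked on the line above.
            return unsafe { sum_u32_avx2(xs) };
        }
    }
    sum_u32_scalar(xs)
}

/// After: the choice is made once and stored as a plain fn pointer.
struct Table {
    sum_u32: fn(&[u32]) -> u32,
}

fn pick_sum_u32() -> Table {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Non-capturing closure coerces to a fn pointer.
            // SAFETY: this branch only runs when AVX2 was detected.
            return Table { sum_u32: |xs: &[u32]| unsafe { sum_u32_avx2(xs) } };
        }
    }
    Table { sum_u32: sum_u32_scalar }
}

static TABLE: LazyLock<Table> = LazyLock::new(pick_sum_u32);

fn sum_u32_dispatched(xs: &[u32]) -> u32 {
    (TABLE.sum_u32)(xs)
}
```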

Dispatch targets (6 free functions across 4 modules):
  byte_scan:    byte_find_all, byte_count (AVX-512 / AVX2 / scalar)
  distance:     squared_distances_f32 (AVX2 / scalar)
  nibble:       nibble_unpack, nibble_above_threshold (AVX2 / scalar)
  spatial_hash: batch_sq_dist (AVX2 / scalar)
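As an illustration of what one AVX2 target can look like, here is a hedged sketch of a `byte_count`-style kernel using `std::arch` intrinsics: 32 bytes per step are compared against a broadcast needle, `movemask` plus `count_ones` tallies the matches, and the leftover tail goes through the scalar path. Names and signatures are assumptions; the PR's `byte_scan` code may differ.

```rust
use std::sync::LazyLock;

/// Scalar fallback, also used for the tail.
fn byte_count_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn byte_count_avx2(haystack: &[u8], needle: u8) -> usize {
    use std::arch::x86_64::*;

    let chunks = haystack.chunks_exact(32);
    let tail = chunks.remainder();
    let mut count = 0usize;
    unsafe {
        // Broadcast the needle into all 32 byte lanes.
        let splat = _mm256_set1_epi8(needle as i8);
        for chunk in chunks {
            // Unaligned 32-byte load; `chunk` is exactly 32 bytes long.
            let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            // 0xFF in every lane equal to the needle, 0x00 elsewhere.
            let eq = _mm256_cmpeq_epi8(v, splat);
            // One bit per lane; the popcount is the number of matches.
            count += (_mm256_movemask_epi8(eq) as u32).count_ones() as usize;
        }
    }
    count + byte_count_scalar(tail, needle)
}

/// Table-compatible safe wrapper.
#[cfg(target_arch = "x86_64")]
fn byte_count_avx2_entry(haystack: &[u8], needle: u8) -> usize {
    // SAFETY: installed only after AVX2 was detected.
    unsafe { byte_count_avx2(haystack, needle) }
}

fn select_byte_count() -> fn(&[u8], u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return byte_count_avx2_entry;
        }
    }
    byte_count_scalar
}

static BYTE_COUNT: LazyLock<fn(&[u8], u8) -> usize> = LazyLock::new(select_byte_count);
```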

NOTE: aabb.rs and cam_pq.rs dispatch on &self methods (not free functions)
so they keep inline simd_caps() branching. The dispatch table covers
the free function hot paths.
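For contrast, the inline style that aabb.rs and cam_pq.rs keep looks roughly like this sketch. `Aabb` here is a hypothetical stand-in (not the crate's type), and `is_x86_feature_detected!` again stands in for `simd_caps()`: the method checks the CPU on each call and forwards to an AVX2-compiled variant or the scalar body.

```rust
/// Hypothetical axis-aligned bounding box, not the crate's `Aabb`.
pub struct Aabb {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

impl Aabb {
    /// True if every point lies inside the box (bounds inclusive).
    pub fn contains_all(&self, points: &[[f32; 3]]) -> bool {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                // SAFETY: AVX2 support was checked on the line above.
                return unsafe { self.contains_all_avx2(points) };
            }
        }
        self.contains_all_scalar(points)
    }

    fn contains_all_scalar(&self, points: &[[f32; 3]]) -> bool {
        points
            .iter()
            .all(|p| (0..3).all(|i| p[i] >= self.min[i] && p[i] <= self.max[i]))
    }

    /// Same logic compiled with AVX2 enabled for this method only.
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn contains_all_avx2(&self, points: &[[f32; 3]]) -> bool {
        self.contains_all_scalar(points)
    }
}
```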

Visibility: internal SIMD functions promoted from pub(super)/private
to pub(crate) so the dispatch table can reference them as fn pointers.
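A small sketch of why the promotion is needed, using hypothetical module and function names. An item that is private or `pub(super)` inside `byte_scan::scalar` cannot be named from the sibling `simd_dispatch` module; `pub(crate)` makes it visible crate-wide while keeping it out of the public API.

```rust
mod hpc {
    pub(crate) mod byte_scan {
        pub(crate) mod scalar {
            // With `pub(super)` this would be visible only inside `byte_scan`,
            // so `simd_dispatch` could not take its address.
            pub(crate) fn byte_count(haystack: &[u8], needle: u8) -> usize {
                haystack.iter().filter(|&&b| b == needle).count()
            }
        }
    }

    pub(crate) mod simd_dispatch {
        pub(crate) struct SimdDispatch {
            pub(crate) byte_count: fn(&[u8], u8) -> usize,
        }

        pub(crate) fn table() -> SimdDispatch {
            // Compiles only because `scalar::byte_count` is `pub(crate)`.
            SimdDispatch { byte_count: super::byte_scan::scalar::byte_count }
        }
    }
}
```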

The 8 existing per-call dispatch sites in nibble/byte_scan/distance/
spatial_hash/aabb/cam_pq still work; the dispatch table is additive.
Consumers can migrate to `(simd_dispatch().fn_ptr)(args)` incrementally.
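A sketch of incremental migration, assuming `simd_dispatch()` returns a `&'static` reference to the LazyLock-backed table. The accessor name comes from this PR; its body, the `nibble_unpack` signature, and the low-nibble-first order are assumptions. Migrated and unmigrated call sites coexist and must agree.

```rust
use std::sync::LazyLock;

pub struct SimdDispatch {
    pub nibble_unpack: fn(&[u8], &mut Vec<u8>),
}

/// Assumed layout: low nibble first, then high nibble, per input byte.
fn nibble_unpack_scalar(packed: &[u8], out: &mut Vec<u8>) {
    for &b in packed {
        out.push(b & 0x0F);
        out.push(b >> 4);
    }
}

static SIMD_DISPATCH: LazyLock<SimdDispatch> = LazyLock::new(|| SimdDispatch {
    // A real table would pick an AVX2 variant here when available.
    nibble_unpack: nibble_unpack_scalar,
});

/// Accessor: forces initialization on first call, then hands out a shared reference.
pub fn simd_dispatch() -> &'static SimdDispatch {
    &SIMD_DISPATCH
}

/// Migrated consumer: calls through the frozen table.
fn unpack_migrated(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    (simd_dispatch().nibble_unpack)(packed, &mut out);
    out
}

/// Not-yet-migrated consumer: still calls the module function directly.
fn unpack_legacy(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    nibble_unpack_scalar(packed, &mut out);
    out
}
```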

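TODO (separate PR): Rust 1.86 stabilized #[target_feature] on safe functions
(target_feature 1.1). The `unsafe` on the SIMD functions is legacy debt that
should be removed. The dispatch wrappers currently bridge this with SAFETY
comments. After the cleanup they shrink but do not disappear: a safe
#[target_feature] function can be called without `unsafe` only from code that
enables the same features, and it does not coerce to a safe fn pointer, so each
table entry still needs a one-line wrapper with a SAFETY comment.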
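A sketch of the post-cleanup shape, assuming Rust 1.86 or later and a hypothetical `count_zero_bytes` kernel. The kernel is declared as a safe `fn`, yet the table still stores a thin wrapper, because the call from non-AVX2 code needs `unsafe` and the kernel cannot be stored directly as a safe `fn` pointer.

```rust
/// Safe to declare since Rust 1.86 (target_feature 1.1).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
fn count_zero_bytes_avx2(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == 0).count()
}

fn count_zero_bytes_scalar(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == 0).count()
}

/// Table entry: a plain `fn` that the dispatch table can store.
#[cfg(target_arch = "x86_64")]
fn count_zero_bytes_avx2_entry(data: &[u8]) -> usize {
    // SAFETY: this entry is only selected after AVX2 was detected.
    unsafe { count_zero_bytes_avx2(data) }
}

fn select_count_zero_bytes() -> fn(&[u8]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return count_zero_bytes_avx2_entry;
        }
    }
    count_zero_bytes_scalar
}
```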

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

AdaWorldAPI merged commit 1825a79 into master on Mar 28, 2026.
4 of 10 checks passed